(54) WWW-based mail service system

(57) In a computerized distributed mail system, a
plurality of client computers are connected to each other
via a network. Each client computer is configured to
execute client mail application programs. A mail service
system is also connected to the network. The system is
for executing server mail programs on server computers.
The mail service system includes an index server
for storing mail messages in message files, and for
storing a full-text index of the mail messages. In addition,
the system includes means for accessing the mail
messages by the plurality of client computers by searching
the full-text index using queries.

Description

FIELD OF THE INVENTION

The present invention relates generally to electronic
mail, and more particularly to electronic mail messaging
in a distributed computer system.

BACKGROUND OF THE INVENTION 

With the advent of large scale distributed computer
systems such as the Internet, the amount of information
which has become available to users of computer
systems has exploded. Among this information is electronic
mail (e-mail). With the improvements in means for
composing and distributing written messages, the amount of
e-mail traffic on the Internet has surged. It is not unusual
for an active Internet user to be exposed to tens of
thousands of e-mail messages a year.

As an advantage, the Internet allows users to
interchange useful information in a timely and convenient
manner. However, keeping track of this huge amount of
information has become a problem. As an additional
advantage, the Internet now allows users to exchange
information in a number of different presentation
modalities, such as text, audio, and still and moving images.
Adapting e-mail systems to organize such complex
information, and providing efficient means to coherently
retrieve the information is not trivial.

As a disadvantage, Internet users may receive
junk-mail whenever they send to mailing lists or engage
in news groups. There are numerous reported incidents
where specific users have been overwhelmed by
thousands of unwanted mail messages. Current filtering
systems are inadequate to deal with this deluge.

Known distributed systems for composing and
accessing e-mail are typically built around protocols
such as IMAP, POP, or SMTP. Typically, users must
install compatible user agent software on any client
computers where the mail service is going to be
accessed. Often, a significant amount of state
information is maintained in the users' client computers. For
example, it is not unusual to store the entire mail
database for a particular user in his desk-top or lap-top
computer. Normally, the users explicitly organize mail
messages into subject folders. Accessing mail generally
involves shipping entire messages over the network to
the client computer.

Such systems are deficient in a number of ways.
Most computers that a user will encounter will not be
configured with user agents compatible with the user's
mail service. Often, a user's state is captured in a
specific client computer which means that work cannot
proceed when the user moves to another computer.
Managing large quantities of archival mail messages by
an explicit folder organization is difficult for most users.
Accessing mail over a low bandwidth network tends to
be unsatisfactory.

Therefore, it is desired to provide a mail system that
overcomes these deficiencies.

SUMMARY OF THE INVENTION 

The invention, in its broad form, resides in an
electronic type distributed mail system, as recited in claim 1.

Described hereinafter is a distributed mail system
where a plurality of client computers are connected to
each other and a mail service system via a network.
Each client computer is configured to execute client mail
application programs. The mail service system is for
executing server mail programs on server computers.
The mail service system includes an index server for
storing mail messages in message files, and for storing
a full-text index of the mail messages. In addition, the
system includes means for accessing the mail
messages by the plurality of client computers by searching
the full-text index using queries.

BRIEF DESCRIPTION OF THE DRAWINGS

A more detailed understanding of the invention may
be had from the following description of a preferred
embodiment given by way of example, and to be
understood with reference to the accompanying drawing
wherein:

♦ Figure 1 is a block diagram of an arrangement of a
distributed mail service system which uses the
invention;

♦ Figure 2 is a block diagram of a mail service system
of the arrangement of Figure 1;

♦ Figure 3 is a block diagram of an account manager
and account records of the system of Figure 2;

♦ Figure 4 is a block diagram of message and log files
maintained by the system of Figure 2;

♦ Figure 5 is a flow diagram of a parsing scheme
used for mail messages processed by the system of
Figure 2;

♦ Figure 6 is a block diagram of a full-text index for the
message files of Figure 4;

♦ Figure 7 is a diagram of a labeled message;

♦ Figure 8 is a diagram of an address book entry;

♦ Figure 9 is a flow diagram for filtering queries; and

♦ Figure 10 is a block diagram for a MIME filter.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

System Overview 

In Figure 1, an arrangement 100 provides a distributed
mail service having features embodying the invention.
In Figure 1, one or more client computers 111-113
are connected via a network 120 to a mail service
system 200 described in greater detail below.

Client Computers

The client computers 111-113 can be workstations,
PCs, lap-tops, palm-tops, network computers (NCs), or
any other similarly configured computer system. The
clients 111-113 can be owned, borrowed, or rented. It
should be noted that in practice, the clients 111-113 can
potentially be any of the millions of personal computer
systems that are currently extant and connected to a
network. Over time, a user may use different client
computers at different locations.

As shown for computer 111, each client computer
executes standard operating system software (O/S)
114, e.g., UNIX®, Windows95®, MacOS® or NT®. The
O/S 114 is used to execute application software
programs. One of the application programs which can
execute on the client 110 is a Web browser 115. The Web
browser 115 can be Netscape Navigator®, Microsoft
Explorer®, HotJava®, and other similar browsers.

The functionality of the browser 115 can be
extended by forms, applets, and plug-ins generally
indicated by reference numeral 116. In the preferred
embodiment, the browser extensions are in the form of
client mail application programs described in greater
detail below. The client mail application programs are
downloaded over the network 120 from the mail service
system 200. The extensions can be implemented using
HTML, JavaScript, Java applets, Microsoft® ActiveX, or
combinations thereof to provide maximum portability.

As shown for computer 112, the client includes one
or more processors (P) 117, memories 118 (M),
input/output interfaces (I/O) 119 connected to each
other by a bus 120. The processors 117 can implement
CISC or RISC architectures in 32, 64, or other bit length
data structures. The memories 118 can include solid
state dynamic random access memory (DRAM), and
fixed and removable memories such as hard disk drives,
CD-ROMs, diskettes, and tapes. The I/O 119 can be
connected to input devices such as a keyboard and a
mouse, and output devices such as a display and a
printer. The I/O 119 can also be configured to connect
to multi-media devices such as sound-cards, image
processors, and the like. The I/O also provides the
necessary communications links to the network 120.

Network 

In the preferred embodiment, the network 120
includes a large number of public access points, and
communications are carried out using Internet
Protocols (IP). Internet protocols are widely recognized as a
standard way of communicating data. Higher level
protocols, such as HTTP and FTP, communicate at the
application layer, while lower level protocols, such as
TCP/IP, operate at the transport and network levels.

Part of the Internet includes a data exchange
interface called the World-Wide-Web, or the "Web" for short.
The Web provides a way for formatting, communicating,
inter-connecting, and addressing data according to
standards recognized by a large number of software
packages. For example, using the Web, multi-media
(text, audio, and video) data can be arranged as Web
pages. The Web pages can be located by the browser
115 using Uniform Resource Locators (URLs).

A URL specifies the exact location of a Web-based
resource such as a server or data record. The location
can include domain, server, user, file, and record
information, e.g.,
"HTTP://www.digital.com/~userid/file.html/~record". An
Internet service can be used to send and receive mail
messages. For example, a mail message can be sent to the
address "jones@mail.digital.com" using the SMTP
protocol. As an advantage, the Internet and the Web allow
users, with only minor practical limitations, to exchange
data no matter where they are using any type of
computer equipment.

Intranet

The mail service system 200 includes one or more
server computers. Usually, the system 200 is part of
some private network (intranet) connected to the public
network 120. Typically, an intranet is a distributed
computer system operated by some private entity for a
selected user base, for example, a corporate network, a
government network, or some commercial network.

Firewall

In order to provide security protection,
communications between components of the Internet and the
intranet are frequently filtered and controlled by a
firewall 130. The purpose of the firewall 130 is to enforce
security policies of the private intranet. One such policy
may be "never allow a client computer to directly
connect to an intranet server via the public portion of the
Internet." The firewall, in part, protects accesses to
critical resources (servers and data) of the intranet.

Only certain types of data traffic are allowed to
cross the firewall 130. Penetration of the firewall 130 is
achieved by a tunnel 131. The tunnel 131 typically
performs a secure challenge-and-response sequence
before access is allowed. Once the identity of a user of
a client has been authenticated, the communications
with components of the intranet are performed via a
proxy server, not shown, using secure protocols such
as SSL and X.509 certificates.

Mail Service System

The mail service system 200 can be implemented
as one or more server computers connected to each
other either locally, or over large geographies. A server
computer, as the name implies, is configured to execute
server software programs on behalf of client computers
111-113. Sometimes, the term "server" can mean the
hardware, the software, or both because the software
programs may dynamically be assigned to different
server computers depending on load conditions.
Servers typically maintain large centralized data repositories
for many users.

In the mail system 200, the servers are configured
to maintain user accounts, and to receive, filter, and
organize mail messages so that they can readily be located
and retrieved, no matter how the information in the
messages is encoded.

General Operation 

During operation of the arrangement 100, users of
the client computers 111-112 desire to perform e-mail
services. These activities typically include composing,
reading, and organizing e-mail messages. Therefore,
the client computers can make connections to the
network 120 using a public Internet service provider (ISP)
such as AT&T or Earthlink. Alternatively, a client
computer can be connected to the Internet at a "cyber-cafe"
such as Cybersmith, or the intranet itself via a local area
network. Many other connection mechanisms can also
be used.

Once a connection has been made, a user can
perform any mail service.

As an advantage, structural and functional
characteristics of the arrangement 100 include the following.
Mail services of the system 200 are available through
any Web-connected client computer. The users of the
services can be totally mobile, moving among different
clients at will during any of the mail activities.
Composition of a mail message can be started on one client,
completed on another, and sent from yet another
computer.

These characteristics are attained, in part, by never
locking a user's state in one of the client computers in
case access is not possible at a later time. This has
the added benefit that a client computer's local storage
does not need to be backed-up because none of the
important data reside there. In essence, this is based on
the notion that the operating platform is the Web, thus
access to the mail service system via the Web is sufficient
to access user data.

The service system will work adequately over a
wide range of connectivity bandwidths, even for mail
messages including data in the form of multi-media.
Message retrieval from a large repository is done using
queries of a full-text index without requiring a complex
classification scheme.

The arrangement 100 is designed to incorporate
redundancy techniques such as multiple access paths,
and replicated files using redundant arrays of
independent disks (RAID) technologies.

Mail Service System

As shown in Figure 2, the mail service system
includes the following components. The system 200 is
constructed to have as a front-end a Web server 210.
The server 210 can be the "Apache" Web server
available from the WWW Consortium. The Web server 210
interacts with back-end common gateway interface
(CGI) programs 220. The programs interface with an
account manager 300, an SMTP mail server 240, and an
index server 250. The CGI programs 220 are one
possible mechanism. The programs could also be
implemented by adding the code directly to the Web server
210, or by adding extensions to the NSAPI from
Netscape.

The top-level functions of the system 200 include
send mail 241, receive mail 242, query index 243,
add/remove label to/from mail 244, and retrieve mail
245. Different servers can be used for the processes
which implement the functions 241-245.

The account manager 300 maintains account
information. The mail server 240 is used to send and receive
mail messages to and from other servers connected to
the network. The index server 250 maintains mail
messages in message files 400, and a full-text index 500 to
messages. The CGI programs 220 also interact with the
message files 400 via a filter 280 for mail message
retrieval.

The Web server 210 can be any standard Web
server that implements the appropriate protocols to
communicate via the network using HTTP protocols
201, for example the Apache server. The CGI back-end
programs 220 route transactions between the Web
server 210 and the operational components of the mail
service system. The CGI back-end 220 can be
implemented as C and TCL programs executing on the
servers.

Account Manager 

As shown in Figure 3, the account manager 300
maintains account information 301-303 for users who
are allowed to have access to the mail system 200.
Information maintained for each account can include:
mail-box address 310, e.g., in the form of a "Post Office
Protocol" (POP-3) address, user password 320, label
state 330, named queries 340, filter queries 350, query
position information 360, user preferences 370, and
saved composition states 380. The full meaning and
use of the account information will become apparent as
other components of the system 200 are described.

As an introduction, passwords 320 are used to
authenticate users. Labels 330 are used to organize
and retrieve mail messages. Labels can be likened to
annotated notes that can be added to and removed from
messages over their lifetimes; in other words, labels are
mutable. Labels help users organize their messages
into subject areas. At any one time, the label state
captures all labels that are active for a particular user.
Labels will be described in greater detail below.

In the system 200, mail messages are accessed by
using queries. This is in contrast to explicitly specifying
subject folders as are used in many known mail
systems. A query is composed of one or more search terms,
perhaps connected by logical operators, that can be
used to retrieve messages. By specifying the name of a
query, a user can easily retrieve messages related to a
particular topic, phrase, date, sender, etc. Named
queries 340 are stored as part of the account information.

Some queries can be designated as "filter" queries
350. This allows a user to screen, for example, "junk
mail," commonly known as spam. Filter queries can
also be used to pre-sort messages received from
particular mailing lists. Query position information 360 records
which message the user last selected with a query. This
way the user interface can position the display of
messages with respect to the selected message when the
query is reissued. User preferences 370 specify the
appearance and functioning of the user interface to the
mail service as implemented by the extended browser
116 of Figure 1. Saved composition states 380 allow a
user to compose and send a message using several
different client computers while preparing the message.

The account manager 300 can generate a new
account, or delete an existing account. The account is
generated for a user by specifying the user name and
password. Once a skeletal account has been
generated, the user can supply the remaining information
such as labels, named queries, filter queries, and so
forth.
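
As a rough illustration only, the account information of
Figure 3 could be grouped into a single record that starts
out skeletal and is filled in later. The class name, field
names, types, and the new_account helper below are
assumptions made for this sketch; only the reference
numerals in the comments come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class AccountRecord:
    """Hypothetical per-user record; numerals refer to Figure 3."""
    mailbox_address: str                                     # 310
    password: str                                            # 320 (a real system would not keep plain text)
    label_state: set = field(default_factory=set)            # 330
    named_queries: dict = field(default_factory=dict)        # 340, name -> query text
    filter_queries: list = field(default_factory=list)       # 350
    query_positions: dict = field(default_factory=dict)      # 360, query -> last selected message
    preferences: dict = field(default_factory=dict)          # 370
    saved_compositions: list = field(default_factory=list)   # 380


def new_account(user: str, password: str, pop_host: str) -> AccountRecord:
    """Create a skeletal account from a user name and password."""
    return AccountRecord(mailbox_address=f"{user}@{pop_host}", password=password)


account = new_account("jones", "secret", "mail.digital.com")
account.label_state.update({"inbox", "unread"})  # supplied after creation
```

Keeping every optional field empty by default mirrors the
two-step flow in the text: the account exists as soon as a
name and password are given, and the rest is added later.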

Mail Server 

Now with continued reference to Figure 2, the mail
service system 200 receives (242) new mail messages
by communicating with the mail server 240 using the
POP-3 protocol. Mail messages are sent (241) using
the SMTP protocol. The appropriate routing information
in the mail server 240 for a particular user can be
generated after the user's account has been generated. A
"POP Account Name" should be specified as the user's
name. In most systems, the name will be case sensitive.
The "POP Host" should be the Internet domain name of
the mail server 240. Here, the case of the letters is
ignored. An IP address such as "16.4.0.16" can be
used, although the domain name is preferred. In some
cases, a particular user's preferred Internet e-mail
address may be unrelated to the POP Account Name,
or the POP Host. The mail server 240 is connected to
the Internet by link 249.

The rapid expansion in the amount of information
which is now available on-line has made it much more
difficult to locate pertinent information. The question "in
which folder did I store that message?" becomes more
difficult to answer if the number of messages that one
would like to save increases over long time periods to
many thousands. The importance and frequency of
accessed messages can vary.

Traditionally, the solution has been to structure the
mail messages in a hierarchical manner, e.g. files,
folders, sub-folders, sub-sub-folders, etc. However, it has
been recognized that such structures do not scale
easily because filing strategies are not consistent over time.
Many users find that hierarchical structures are
inadequate for substantial quantities of e-mail messages
accumulated over many years, particularly since the
meaning and relation of messages changes over time.
Most systems with an explicit filing strategy require
constant and tedious attention to keep the hierarchical
ordering consistent with current needs.

Message Repository 

Messages are stored in message files 400 and a
full-text index. The organization of the message files is
first described. This is followed by a description of the
full-text index 500. As a feature of the present invention,
user interaction with the mail messages is primarily by
queries performed on the full-text index 500.

As shown in Figure 4, the index server 250 assigns
each received message 401-402 a unique identification
(MsgID) 410. The MsgID 410 is composed of a file
identification (FileID) 411, and a message number
(MsgNum) 412. The FileID "names," or is a pointer to, a
specific message file 420, and the MsgNum is some
arbitrary numbering of messages in a file, e.g., an index
into the file 420.
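
The two-part identifier can be sketched as a small tuple
type. This is a minimal sketch under stated assumptions:
the names MsgID, file_id, msg_num, and fetch are chosen
for illustration, and message files are modeled as
in-memory lists rather than real files.

```python
from typing import NamedTuple


class MsgID(NamedTuple):
    file_id: int   # FileID 411: names one message file
    msg_num: int   # MsgNum 412: here, an index into that file


def fetch(files: dict, msg_id: MsgID) -> str:
    """Locate a message by picking the file first, then the entry."""
    return files[msg_id.file_id][msg_id.msg_num]


# Hypothetical repository with a single message file numbered 7.
files = {7: ["first message", "second message"]}
second = fetch(files, MsgID(file_id=7, msg_num=1))
```

Because the tuple is immutable and hashable, it can serve
directly as a dictionary key, which matches the statement
that an identifier never changes once assigned.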

A message never changes after it has been filed.
Also, the MsgID 410 forever identifies the same
message, and is the only ID for the message. In the
referenced message file 420, a message entry 430 includes
the MsgNum stored at field 431, labels 432, and the
content of the message itself in field 433.

The number of separate files 420 that are
maintained for storing messages can depend on the design
of the underlying file system and specific
implementation details. For example, the size and number of entries
of a particular file may be limited by the file system.
Also, having multiple files may facilitate file maintenance
functions such as back-up and restore.

Label Log 

Although a message may never change, the set of
labels associated with a message may change.
Because labels can change, a transaction log 440 is
also maintained. The log 440 includes "add" entries
(+label) 450, and "remove" entries (-label) 460. Each
entry includes the MsgID 451 or 453 of the affected
message entry, and the label that is being added (452) or
deleted (453). The contents of the log 440 are
occasionally merged with the message files 420. Merged entries
are removed from the log 440. The label log 440 allows
for the mutation of labels attached to data records such
as mail messages, where the labels and the data which
are labeled are stored in the same index.
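
The add/remove log and its occasional merge can be
illustrated as follows. This is a simplified sketch, not the
described file format: the LabelLog class, its method
names, and the use of a dictionary of sets to stand in for
the labels stored in the message files are all assumptions.

```python
from collections import defaultdict


class LabelLog:
    """Pending label changes, kept in the order they were made."""

    def __init__(self):
        self.entries = []  # ("+" or "-", msg_id, label) tuples

    def add(self, msg_id, label):
        self.entries.append(("+", msg_id, label))

    def remove(self, msg_id, label):
        self.entries.append(("-", msg_id, label))

    def merge_into(self, stored_labels):
        """Apply pending entries in order, then drop them from the log."""
        for op, msg_id, label in self.entries:
            if op == "+":
                stored_labels[msg_id].add(label)
            else:
                stored_labels[msg_id].discard(label)
        self.entries.clear()


stored = defaultdict(set)          # stands in for labels 432 in the files
log = LabelLog()
log.add((1, 0), "inbox")
log.add((1, 0), "unread")
log.remove((1, 0), "unread")
log.merge_into(stored)
```

Replaying entries in their original order matters: an add
followed by a remove of the same label must leave the
label absent after the merge.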

Full-Text Index

Figures 5 and 6 show how the index server 250
generates the full-text index 500. Newly received mail
messages are processed in batches 403-404.
Messages 401 and 402 of a batch are parsed into individual
words 510. A batch 403 in a large mail service system
may include hundreds or thousands of messages. The
words of the messages are parsed in the order that they
are received in a batch. Each word is arbitrarily
assigned a sequential location number 520.

For example, the very first word of the very first
message of the very first batch is assigned location "1,"
the next word location "2," and the last word location "3."
The first word of the next message is assigned the next
sequential location "4," and so forth. Once a location
has been assigned to a word, the assignment never
changes. If the location is expressed as a 64 bit number,
then it is extremely unlikely that there will ever be an
overlap on locations.

As the messages are parsed, the indexing process
generates additional "metawords" 530. For example, an
end-of-message (eom) metaword is generated for the
last word of each message. The metawords are
assigned the same locations as the words which
triggered their generation. In the example shown, the
location of the first eom metaword is "3," and the second is
"5."

Other parts of the message, such as the "To,"
"From," "Subject," and "Date" fields may generate other
distinctive metawords to help organize the full-text index
500. Metawords help facilitate searches of the index.
Metawords are appended with predetermined
characters so that there is no chance that a metaword will ever
be confused with an actual parsed word. For example,
metawords include characters such as "space" which
are never allowed in words. Hereinafter, the term
"words" means both actual words and synthesized
metawords.
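
The location numbering and the end-of-message metaword
can be sketched as below. The function name, the
whitespace-only word splitting, and the spelling " eom"
(with a leading space so that it can never equal a parsed
word) are assumptions for illustration; the two sample
messages are invented so that the eom metawords land on
locations 3 and 5 as in the example above.

```python
EOM = " eom"  # assumed spelling; the space cannot occur in a parsed word


def assign_locations(messages, next_loc=1):
    """Give every word in a batch a sequential location number.

    Returns the (word, location) pairs and the next unused location.
    """
    pairs = []
    for text in messages:
        words = text.split()
        for position, word in enumerate(words):
            pairs.append((word, next_loc))
            if position == len(words) - 1:
                # The metaword shares the location of the word that triggered it.
                pairs.append((EOM, next_loc))
            next_loc += 1
    return pairs, next_loc


pairs, next_loc = assign_locations(["please call me", "email me"])
eom_locations = [loc for word, loc in pairs if word == EOM]
```

Returning the next unused location lets a later batch
continue the numbering, so an assignment made in an
earlier batch is never reused.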

After a batch of messages has been parsed, the
words and their assigned locations are sorted 540, first
according to the collating order of the words, and
second according to their sequential locations. For example,
the word "me" appears at locations "3" and "5" as shown
in box 550. The sorted batch 550 of words and locations
is used to generate the index. Each sorted batch 550 is
merged into the index 500, initially empty.
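
The sort-then-merge step can be sketched in a few lines.
The sketch assumes the index is an in-memory dictionary
from word to a list of locations, and it uses Python's
default string ordering as a stand-in for the collating
order; neither detail is taken from the description.

```python
def merge_batch(index, pairs):
    """Sort (word, location) pairs and merge them into `index` in place."""
    for word, loc in sorted(pairs):            # by word, then by location
        index.setdefault(word, []).append(loc)
    return index


index = {}                                     # initially empty
merge_batch(index, [("call", 2), ("me", 5), ("me", 3), ("email", 4)])
merge_batch(index, [("me", 9)])                # a later batch
words_in_order = sorted(index)                 # entries in collating order
```

Appending is enough to keep each location list ascending
because locations in a later batch are always larger than
those already indexed.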

Index Structure 

Figure 6 shows the logical structure of an index 600
according to the preferred embodiment. The index
includes a plurality of word entries 610. Each word entry
610 is associated with a unique "word" that appeared at
least once in some indexed message. The term "word"
is used very loosely here, since the parsing of the words
in practice depends on which marks/characters are
used as word separators. Words do not need to be real
words that can be found in a dictionary. Separators can
be spacing and punctuation marks.

The indexer 250 will parse anything in a message
that can be identified as a distinct set of characters
delineated by word separators. Dates are also parsed
and placed in the index. Dates are indexed so that
searches on date ranges are possible. In an active
index there may well be millions of different words.
Therefore, in actual practice, compression techniques
are extensively used to keep the files to a reasonable
size, and allow updating of the index 500 as it is being
used.

The word entries 610 are stored in the collating
order of the words. The word is stored in a word field
611 of the entry 610. The word field 611 is followed by
location fields (locs) 612. There is one location field 612
for every occurrence of the word 611. As described in
the Burrows reference, the locations are actually stored
as a sequence of delta-values to reduce storage. The
index 600 is fully populated. This means the last byte
614 of the last location field of a word is immediately
followed by the first byte 615 of the next word field.
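
The general idea of storing locations as delta-values can
be shown with a short round trip. This sketch does not
reproduce the encoding of the Burrows reference, whose
details are not given here; the function names and the
plain integer lists are assumptions.

```python
def to_deltas(locations):
    """Replace each ascending location by its gap from the previous one."""
    deltas, previous = [], 0
    for loc in locations:
        deltas.append(loc - previous)
        previous = loc
    return deltas


def from_deltas(deltas):
    """Rebuild the absolute locations by running summation."""
    locations, total = [], 0
    for gap in deltas:
        total += gap
        locations.append(total)
    return locations


stored = to_deltas([3, 5, 1000, 1004])
restored = from_deltas(stored)
```

For a frequently occurring word the gaps are small
numbers, which a variable-length byte encoding can then
store in fewer bytes than full 64 bit locations.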

Labels 

Labels provide a way for users to annotate mail
messages. Attaching a label to a message is similar to
affixing a note to a printed document. Labels can be
used to replace the folder mechanisms used by many
prior art mail systems. However, a single mail message
can be annotated with multiple labels. This compares
favorably to folder-based systems where a message
can only be stored in a single folder.

Users can define a set of labels with which to work.
The labels are nothing more than predefined text
strings. The currently active set of labels for a particular
user, e.g. the label state 330 of Figure 3, is maintained
by the account manager 300 and is displayed in a
window of the graphical user interface. Labels can be
added and removed by the system or by users.

As shown in Figure 6, labels are stored in a data
structure 650 that parallels and extends the functionality
of the full-text index 500. Labels are subject to the same
constraints as index words. Also, queries on the full-text
index 500 can contain labels, as well as words, as
search terms. A label is added to a mail message by
adding a specific index location (or locations) within the
message to the set of locations referred to by the
specified label. Label removal is the opposite. Operations on
labels are much more efficient than other operations
that mutate the state of the full-text index.

The on-disk data structure for the label index 650
that represents the label state 330 is the same as that
described for index word entries 600. This means that
the label state can be thought of as an extension of the
full-text index 500. Accordingly, the label index
extension, like the index 500, maps labels (words) 651 to
sequences of index locations 652.

Although the structured formats of the label
extension 650 and the full-text index 500 are the same, for
efficiency reasons, the label portion of the index is
managed by a software component that is distinct from the
software that manages the full-text index 500. If a term
of a query string is found to be a label, then the label
index 650 is searched to provide the necessary location
mapping. This mapping is further modified by the label
log 440 that contains all recent label mutations
(additions or removals). The label log 440 can include an
in-memory version 660. Since operations on this structure
are in-memory, updates for recent label mutations 660
can be relatively fast while the updating of the label
index 650 can take place in background.
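
The way a label lookup combines the label index with
recent mutations can be sketched as follows. The function
name, the dictionary standing in for the on-disk label
index 650, and the list of tuples standing in for the
in-memory log 660 are assumptions for illustration.

```python
def label_locations(label, label_index, recent_log):
    """Locations for `label`: stored mapping adjusted by recent mutations."""
    locations = set(label_index.get(label, ()))
    for op, name, loc in recent_log:       # applied oldest to newest
        if name != label:
            continue
        if op == "+":
            locations.add(loc)
        else:
            locations.discard(loc)
    return sorted(locations)


label_index = {"inbox": [3, 5]}            # not yet updated in background
recent_log = [("+", "inbox", 9), ("-", "inbox", 3), ("+", "unread", 9)]
inbox_now = label_locations("inbox", label_index, recent_log)
```

Queries see the label change immediately through the
log, so the slower rewrite of the label index can be
deferred without users noticing stale results.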

As shown in Figure 7, a message 700 includes a
header 701 and a body 702. The header 701 typically
includes the "To", "From", "Date" and "Subject" fields.
The header may also include routing information. The
body 702 is the text of the mail message.

Each mail message can initially receive two labels,
"inbox" 710 and "unread" 720. Messages labeled as
"unread" 720 have not yet been exposed for reading.
Messages with the "inbox" label 710 are deemed to
require the user's attention. As will be described below,
it is possible for messages to be labeled as unread but not
have the inbox label. These less important messages
can be read by the user as needed.

Outputting, e.g., displaying or printing, a message
removes the unread label 720 under the assumption
that it has been read. A user can explicitly add or
remove the unread label. A message can be deleted by
attaching a "delete" label 730. This has the effect that
the message will not be seen again because
messages labeled as deleted are normally excluded during
searches. Removing the deleted label has the effect of
"undeleting" a message.

Although a preferred embodiment uses labels for
data records that are mail messages, it should be
understood that "mutable" labels can also be used for
other types of data records. For example, labels which
can be added and removed can be used with data
records such as Web pages, or news group notes. The
key feature here is that labels are indexed in the same
index as the record which they label, and that labels can
be added and removed.
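
The label transitions described in this section (initial
labels, clearing "unread" on output, and deletion by
labeling) can be summarized in a small sketch. The
function names are invented, a message's labels are
modeled as a plain set, and the label text "deleted" is an
assumption about the exact spelling used.

```python
def receive():
    """Labels given to a newly arrived message."""
    return {"inbox", "unread"}


def output(labels):
    """Displaying or printing a message clears its unread label."""
    return labels - {"unread"}


def delete(labels):
    """Deletion only attaches a label; the message itself is untouched."""
    return labels | {"deleted"}


def undelete(labels):
    return labels - {"deleted"}


def visible(labels, show_deleted=False):
    """Deleted messages are normally left out of search results."""
    return show_deleted or "deleted" not in labels


labels = output(receive())       # the message has now been read
trashed = delete(labels)
```

Since deletion is just another label, "undeleting" needs
no special recovery path: removing the label makes the
message match ordinary searches again.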

Queries 

After e-mail messages have been indexed and
labeled, the messages can be retrieved by issuing
full-text queries. A query searches for messages that match
on words and labels specified in the query. This is in
contrast with known mail systems where users access
mail by remembering in which file, folder, or sub-folder
messages have been placed so the folder can be
searched. As an advantage of the present system,
users only need to recall some words and labels to find
matching messages.



The syntax of the query language is similar to that
described in the Burrows reference. A query includes a
sequence of primitive query terms, combined by operators
such as "and," "or," "not," "near," and so forth. A
primitive term can be a sequence of alpha-numeric
characters, i.e., a "word," without punctuation marks. If
the terms are enclosed within quotation marks ("), the
search is for an exact match on the quoted string. A
term can be a label. A term such as "from:fred"
searches for messages with the word "fred" in the "from"
field of a message header. Similar queries can be
formulated for the "to," "from," "cc," and "subject" fields of
the header.
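As an illustration of how such terms and operators might be evaluated against a single message, consider the following sketch. It assumes a query has already been parsed into nested tuples, represents a message as a plain dictionary, and omits the "near" operator and quoted strings; the function name matches and the message layout are hypothetical.

```python
def words(text):
    return text.lower().split()


def matches(query, message):
    """Evaluate a query tree against one message.

    A query is a string term, or a tuple ("and", a, b), ("or", a, b)
    or ("not", a).
    """
    if isinstance(query, tuple):
        op = query[0]
        if op == "and":
            return matches(query[1], message) and matches(query[2], message)
        if op == "or":
            return matches(query[1], message) or matches(query[2], message)
        if op == "not":
            return not matches(query[1], message)
        raise ValueError("unknown operator: " + op)
    if ":" in query:
        # Field term such as "from:fred": look only in that header field.
        field, word = query.split(":", 1)
        return word.lower() in words(message["headers"].get(field, ""))
    if query in message["labels"]:
        return True
    all_text = " ".join(message["headers"].values()) + " " + message["body"]
    return query.lower() in words(all_text)


msg = {
    "headers": {"from": "Fred Smith", "to": "jones", "subject": "Budget review"},
    "body": "Please read the attached budget",
    "labels": {"unread"},
}
```

For example, matches("from:fred", msg) holds for the message above, while matches("to:fred", msg) does not.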

A term such as "11/2/96-25/Dec/96" searches for
all messages in the specified date range. The parsing of
dates is flexible, e.g., 12/25/96, 25/12/1996, and
Dec/25/96 all mean the same date. In case of ambiguity
(2/1/96), the European order (day/month) is assumed.
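The date rules above can be sketched as follows. The function names parse_date and parse_range are hypothetical, and the mapping of two-digit years to 19xx is an assumption made for this example; the text does not state how two-digit years are resolved.

```python
from datetime import date

MONTHS = {name: i + 1 for i, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"])}


def parse_date(text):
    """Parse dates such as 12/25/96, 25/12/1996 or Dec/25/96."""
    a, b, y = text.strip().split("/")
    year = int(y)
    if year < 100:
        year += 1900  # assumption: two-digit years mean 19xx
    if a.lower()[:3] in MONTHS:
        month, day = MONTHS[a.lower()[:3]], int(b)   # Dec/25/96
    elif not b.isdigit():
        day, month = int(a), MONTHS[b.lower()[:3]]   # 25/Dec/96
    else:
        first, second = int(a), int(b)
        if second > 12 and first <= 12:
            month, day = first, second               # 12/25/96
        else:
            day, month = first, second               # European order
    return date(year, month, day)


def parse_range(term):
    """Parse a range term such as 11/2/96-25/Dec/96."""
    start, end = term.split("-")
    return parse_date(start), parse_date(end)
```

Under these rules the ambiguous 2/1/96 is read as 2 January 1996, and the range 11/2/96-25/Dec/96 runs from 11 February 1996 to 25 December 1996.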

During normal operation, the CGI program 220
modifies each issued query by appending a term which
excludes the "deleted" label, e.g., "and not deleted."
This has the effect of hiding all deleted messages from
the user of the client. There is an option in the user
interface which inhibits this effect to make deleted messages
visible.
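A minimal sketch of this query rewriting is given below. The function name effective_query and the show_deleted flag are hypothetical, and wrapping the user's query in parentheses is an assumption made so that the appended term applies to the whole query.

```python
def effective_query(query, show_deleted=False):
    """Return the query that is actually sent to the index."""
    if show_deleted:
        # The user-interface option inhibits the rewrite.
        return query
    # Assumption: parentheses group the user's query before appending.
    return "(" + query + ") and not deleted"
```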

Named Queries 

Queries can be "named." Named queries are maintained
by the account manager 300. By specifying the
name of a query, users can quickly perform a search for
e-mail messages including frequently used terms.
Users can compose complex queries to match on some
pattern in indexed messages, perhaps intermixing
conditions about messages having particular text or labels,
and keep the query for subsequent use.

Named queries can be viewed as a way of replacing
prior art subject folders. Instead of statically organizing
messages into folders according to predetermined
conditions, queries allow the user to retrieve a specific
collection of messages depending on a current set of
search terms. In other words, the conditions which
define the collection are dynamically expressed as a
query.
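The idea of a named query standing in for a folder can be sketched as a per-user mapping from names to query strings. The class NamedQueries and its methods are hypothetical, and the search argument stands for whatever function actually runs a query against the index.

```python
class NamedQueries:
    """Per-user mapping from a query name to its query string."""

    def __init__(self):
        self._queries = {}

    def name_query(self, name, query):
        self._queries[name] = query

    def forget(self, name):
        self._queries.pop(name, None)

    def names(self):
        return sorted(self._queries)

    def run(self, name, search):
        # 'search' is any callable that takes a query string.
        return search(self._queries[name])


saved = NamedQueries()
saved.name_query("budget-mail", "subject:budget and from:fred")
```

Unlike a folder, the collection returned by running "budget-mail" is recomputed each time, so newly indexed messages appear without being filed.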

History List 

Recently performed queries are kept in a "history"
list. Accordingly, frequently performed queries can readily
be re-issued, for example, when the index has been
changed because of newly received mail, or because of
actions taken by other client computers.

Dynamic Address Book 

Queries can also be used to perform the function of
prior art "address books." In many known e-mail systems,
users keep address books of frequently used
addresses. From time to time, users can add and
remove addresses. There, the address books are statically
maintained as separate data structures or address
book files. For example, there can be "personal" and
"public" related address books. In contrast, here, there
is no separately stored address book. Instead, an
"address book" is dynamically generated as it is
needed. The dynamic address book is generated from
the files 400 and the full-text index 500 as follows.

As shown in Figure 8, a user of a client computer
820 can generate address book type information using
a form 800 supplied by one of the client mail application
programs 116. The form 800 includes, for example,
entry fields 801-803 for address related information
such as name, phone number, (hard-copy) mail
address, (soft-copy) e-mail address, and so forth.
Alternatively, address information can be selected from
a prior received mail message 805 by clicking on
appropriate fields in the header or body of the message 805.

From the perspective of the mail service system
200 and the index server 250, the address book
information is handled exactly as a received mail message.
This means that, for example, the data of the fields
801-803 are combined into an "address book" mail message
810. An "address" label 809 can also be added to the
entry using the labeling convention as described herein.
The address book mail message 810 and label 809 can
be stored in one of the message files 400. Additionally,
the message 810 can be parsed and inserted into the
full-text index 500 as are the words and labels of any
other mail message. In other words, the address
information of form 800 is merged and blended with the
full-text index 500.

After the address information has been filed and
indexed, the address information can be retrieved by the
user of the client computer 820 composing a query 830
using the standard query interface, with, perhaps, the
label "address" as one of the query terms. The exact
content to be retrieved is determined at the time that the
terms and operators of the query 830 are composed by
the user. The address information, i.e., one or more
address book mail messages, which satisfies the query
is returned to the client computer 820 as the dynamic
address book 840. The user can then select one of the
addresses as a "to" address for a new, reply, or forward
mail message.
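The handling of address information as ordinary labeled messages can be sketched as follows. The dictionary layout, the helper names make_address_message and dynamic_address_book, and the sample entries are hypothetical; a real system would query the full-text index rather than scan a list.

```python
def make_address_message(name, phone, email):
    """Combine form fields into an ordinary message with an 'address' label."""
    body = "name: %s\nphone: %s\nemail: %s" % (name, phone, email)
    return {"body": body, "labels": {"address"}}


def dynamic_address_book(messages, word):
    """Return address-book messages whose text contains the given word."""
    word = word.lower()
    return [m for m in messages
            if "address" in m["labels"] and word in m["body"].lower().split()]


store = [
    {"body": "Lunch on Friday with Fred", "labels": {"inbox"}},
    make_address_message("Fred Smith", "555-0100", "fred@example.com"),
    make_address_message("Ann Jones", "555-0101", "ann@example.com"),
]
book = dynamic_address_book(store, "fred")
```

Here the ordinary message mentioning Fred is excluded because it lacks the "address" label, so the generated address book contains only the Fred Smith entry.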

Message Resemblance 

It is also possible to search for messages which
resemble a currently selected message. In this case a
document resemblance technique can be used. Such a
technique allows a user to find all messages which
closely relate to each other.

Sorting Search Results

When a search for an issued query completes, the
results of the search are presented in an order according
to their MessageID 411, Figure 4. In practice, this
means that qualifying messages are presented in the
temporal order of when the messages were received.

Most prior art e-mail systems allow other sort
orders, such as by sender, or by message thread (a
sequence of related messages). There is no need for
such capabilities here. Consider the following
possibilities.

Messages from a particular user can be specified
by including in a query a term such as "from:jones." This
will locate only messages from a particular user. You
can select messages of a particular "thread" by using
the "view discussion" option of the user interface
described below. As stated above, messages for a
particular date range can be specified in the query.

Filtering Messages 

In order to facilitate mail handling, particularly for
someone receiving a large amount of e-mail, a user can
configure the filter 280 to his or her own preferences as
shown in Figure 9. A message filter is specified as one
or more named "filter" queries 910. The named query
910 is stored as part of the account information of
Figure 3. The named filter query 910 can be composed on
a client computer 920 using the client mail application
programs down-loaded from the mail service system
200.

New messages 930 received by the mail service
system 200 are stored, parsed, and indexed in the
message files 400 and full-text index 500 as described
above. In addition, each new message 930 can be
compared with the named queries 910. If the content of a
new message 930 does not match any of the named
filter queries 910, then the new message 930 is given the
inbox label 710 and the unread label 720, i.e., the
message is placed in the "In-box" 940 for the user's
attention. Otherwise, the new message 930 is only given the
unread label 720.
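The labeling rule for new messages can be sketched as follows. The function label_new_message is hypothetical, and contains_word is only a stand-in matcher for this example; in the described system the comparison would use the full query language.

```python
def label_new_message(message, filter_queries, matches):
    """Choose the initial labels for a newly received message.

    'matches(query, message)' is any predicate that decides whether a
    message satisfies a filter query.
    """
    labels = {"unread"}
    if not any(matches(q, message) for q in filter_queries):
        # No filter query matched: bring the message to the user's attention.
        labels.add("inbox")
    return labels


def contains_word(query, message):
    # Stand-in matcher for this sketch: the query is a single word.
    return query.lower() in message.lower().split()


filters = ["advertisement"]
```

With these filters, a message reading "Lunch on Friday" receives both the inbox and unread labels, while "Paid advertisement inside" receives only the unread label.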

For example, mail which is sent out typically has a
"from" field including the name of the sender, e.g.,
"From: Jon Doe," in the message header. Alternatively,
the body of the mail message may include the text "You
are getting this message from your good friend Jon
Doe." The user Jon Doe can set a named filter query
"SentByME" as "from near (Jon Doe)". This query will
match any message which contains the word "from"
near the word phrase "Jon Doe." The effect is that users
do not explicitly become aware of messages that match
on the filter query 910. For example, a user may want to
filter messages which are "cc" copies to oneself. A user
may also desire to filter out junk e-mail messages
arriving from commercial e-mail distributors at known
domains, or pre-sort messages received via mailing
lists.

Message Display Options

From the user's perspective, access to the mail
services is implemented by extensions to the Web
browser, such as Java applets. Messages are normally
displayed by their primary component being transmitted
to the client in the HTML format, and being displayed in
the Java applet's window. The first line of a displayed
message contains any "hot-links" which the user can
click to display the message in one of the Web
browser's windows, either with the HTML formatting, or
as the original text uninterpreted by the system. It
should be noted that headers in Internet messages,
depending on routing, can be quite lengthy. Therefore, it
is possible to restrict the view to just the "from," "to,"
"cc," "date," and "subject" fields of the header.

Embedded Links

When displaying retrieved messages, the system
200 heuristically locates text strings which have the
syntax of e-mail addresses. If the user clicks on one of these
addresses, then the system will display a composition
window, described below, so that the user can easily
generate a reply message to the selected e-mail
address(es).

Similarly, when displaying retrieved messages, the
system 200 heuristically locates text strings that have
the syntax of a URL, and makes the string a hot-link.
When the user clicks on the hot-link, the URL is passed
to the browser, which will retrieve the contents over the
network, and process the content in the normal manner.
The system also attempts to detect components in
messages, such as explicitly "attached" or implicitly
"embedded" files. The files can be in any number of possible
formats. The content of these files is displayed by the
browser 115. The specific display actions used will
depend on how the browser is configured to respond to
different component file formats.
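The heuristic location of address-like and URL-like strings can be illustrated with regular expressions. The patterns below are deliberately loose assumptions chosen for this sketch, not the heuristics of the described system, and the function name find_links is hypothetical.

```python
import re

# Loose, illustrative patterns; they do not implement the full
# e-mail address or URL grammars.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"]+")


def find_links(text):
    """Return (kind, matched_text) pairs for URL-like and address-like strings."""
    found = [("url", m.group()) for m in URL_RE.finditer(text)]
    found += [("mailto", m.group()) for m in EMAIL_RE.finditer(text)]
    return found


links = find_links(
    "Write to fred@example.com or see http://www.example.com/faq today")
```

Each pair could then be rendered as a hot-link: a "mailto" match would open a composition window, and a "url" match would be passed to the browser.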

For some file formats, for example GIF and JPEG,
the component can directly be displayed. It is also
possible to configure the browser with a "helper" applet to
"display" attached files having specific format types as
"icons." For example, the message may be in the form
of an audio message, in which case, the message
needs to be "said," and not displayed. For some
message formats, the browser may store some of the
content in the file system of the client computer.

Low-Bandwidth Filtering

Since the client computers 111-113 may access the
mail service system via low-bandwidth network
connections, an attempt is made to minimize the amount of
data that are sent from the mail service system to the
client computers. Even over high-speed communications
channels, minimizing the amount of network traffic
can improve user interactions.



Because the mail service system 200 allows mail
messages to include attached or embedded multi-media
files, mail messages can become quite large. In
the prior art, the entire mail message, including any
included files, is typically shipped to the client computer.
Thus, any part of the mail message can immediately be
read by the user after the message has been received in
the client.

As shown in Figure 10, the mail service system 200
can recognize message components that are included
as such. The system 200 can discover an explicitly
attached file 1010 to a message 1000, and the system
200 can also heuristically discover textual components
1020-1021 that are implicitly embedded without MIME
structuring in the message. For example, the system
200 can recognize embedded "uuencoded" enclosures,
base 64 enclosures, Postscript (and PDF) documents,
HTML pages, and MIME fragments.

Accordingly, the system 200 is configured to
"hold-back" such components 1010, 1020-1021 encoded in
different formats using a "MIME" filter 1001. The
attached and embedded components are replaced by
hot-links 1031 in a reduced size message 1030. Only
when the user clicks on one of the hot-links 1031 are the
components sent to the requesting client computer.
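The hold-back step can be sketched as follows, assuming a message has already been split into (content type, payload) parts. The function reduce_message, the link format "/component/<message>/<part>", and the sample data are hypothetical.

```python
def reduce_message(message_id, parts):
    """Replace non-text components with hot-links and keep text inline.

    'parts' is a list of (content_type, payload) pairs. Returns the
    reduced message text and a table of held-back components keyed
    by link.
    """
    held_back = {}
    lines = []
    for number, (content_type, payload) in enumerate(parts):
        if content_type == "text/plain":
            lines.append(payload)
        else:
            # Hypothetical link format for fetching the component later.
            link = "/component/%s/%d" % (message_id, number)
            held_back[link] = payload
            lines.append("[%s: %s]" % (content_type, link))
    return "\n".join(lines), held_back


reduced, held = reduce_message("m42", [
    ("text/plain", "Slides attached."),
    ("application/postscript", b"%!PS-Adobe-3.0 ..."),
])
```

Only the short reduced text travels to the client at first; a held-back payload is sent when its link is followed.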

Client Computer User Interface

The following sections describe how the Web
browser 115 is configured to provide the e-mail
services of the system 200. The functions described can be
displayed as pull-down menus, or as button bars
depending on a desired appearance. Preferably, the
functions are implemented as Java applets.

File Menu 

The file menu has the following options:
Administration, Preferences, and Quit. If the user clicks on the
Administration option button, then the system 200 loads
the system administrative page into the browser 115.
Using the Administrative window, subject to access
controls, the user can view and modify accounts, and view
the server log files. The Preferences option is used to
modify user preferences 370. Quit returns to the main
log-in window.

Queries Menu 

This menu includes the View Discussion, Name
Current Query, Forget Named Query, Exclude "deleted"
Messages, and Your Named Queries options. The View
Discussion option issues a query for messages related to the
currently selected message. Here, "related" means any
messages which share approximately the same subject
line, and/or are in reply to such a message, or messages
linked by a common standard "RFC822" message ID.

The Name Current Query option allows a user to attach a
text string to the current query. This causes the system
200 to place the query in the account for the user for
subsequent use. The Forget Named Query option
deletes a named query.

The Exclude "deleted" Messages option omits from
a query result all messages that have the deleted label.
This is the default option. Clicking on this option
changes the behavior of the system 200 to include, in
response to a query, "deleted" messages. The Your
Named Queries option displays a particular user's set of
named queries 340. Clicking on any of the displayed
names issues the query.

Labels Menu 

This menu includes the Record Label, and Forget
Label options. These options respectively allow for the
addition and removal of labels to and from the label
state 330.

History Menu 

The client keeps a history of, for example, the last
ten queries to allow for the reissue of queries. The
options of this menu are Go Back, Redo Current Query,
Go Forward, and The History List. Go Back reissues the
query preceding the current query. Redo reissues the
current query. This option is useful to process messages
which have recently arrived, or in the case where
the user's actions have altered the message files 400
in some other manner. Go Forward reissues the query
following the current query. The History List displays all
of the recently issued queries. Any query listed can be
reissued by clicking on the query.
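The history options can be sketched as a bounded list with a current position. The class QueryHistory is hypothetical, and discarding the "forward" entries when a new query is issued is an assumption borrowed from typical browser history behavior; the text does not specify it.

```python
class QueryHistory:
    """Bounded list of recent queries with a current position."""

    def __init__(self, limit=10):
        self.limit = limit
        self.queries = []
        self.position = -1

    def issue(self, query):
        # Assumption: a new query discards any "forward" entries.
        self.queries = self.queries[: self.position + 1] + [query]
        self.queries = self.queries[-self.limit:]
        self.position = len(self.queries) - 1
        return query

    def go_back(self):
        if self.position > 0:
            self.position -= 1
        return self.queries[self.position]

    def redo(self):
        return self.queries[self.position]

    def go_forward(self):
        if self.position < len(self.queries) - 1:
            self.position += 1
        return self.queries[self.position]


history = QueryHistory()
for q in ["inbox", "from:fred", "subject:budget"]:
    history.issue(q)
```

Each method returns the query to be reissued, so Redo simply runs the current query again against the possibly changed index.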


Messages Menu 

Options here include: Select All, Select Unread,
Select Read, Mark As Unread, Mark As Read, Add
Labels, Remove Labels, and Use Built-in Viewer. The
Select All option selects all messages which match the
current query. The next two options respectively select
messages that do, and do not, have the unread label. The
following two options respectively add the unread label
to, and remove it from, currently selected messages.

The user interface normally displays a message by
converting the message to an HTML format and
presenting it to an HTML viewer which can either be in the
browser's main window, or with a built-in viewer. The
last option of the message menu selects the viewer.

Help Menu 

The help options can be used to display informational
pages on how to use the various features of the
system. The help pages are down-loaded on demand
into the client computer from the mail service system
200.



Main Window Menu Bar 

This menu bar includes buttons for the following
functions. The functions are enabled by clicking on the
button.

Add: This button is used to add a selected label to
a message.

Relabel: This button combines the functions of the
unlabel and add functions.

Delete: With this button, a deleted label is added to
a message.

Unlabel: Used to remove a single label mentioned
in a query from a message.

Next: Selects a next message.

Prev: Selects a preceding message.

Newmail: Issues a query for all messages having the
inbox label.

Query: Presents a dialog to compose and issue a
query.

Message Display Button Bar 

This button bar is used to perform the following
functions.

Detach: Generate a new top-level window to display
selected messages.

Compose: Generate a window for composing new
mail messages.

Forward: This function sets up a window for
composing a new message. A selected message is
attached to the new message. The attached
messages are forwarded without the need of
down-loading the messages to the client computer.

Reply To All: This function sets up a window for
composing a new message with the same
recipients as those in a selected message.

Reply To Sender: Set up a window for composing a
new message to the sender of a selected message.

Composition Window 

Access to the composition window is gained by
clicking on the Compose, Forward, Reply, or Modify
button, or by clicking on a "mail-to" hot link in a
displayed message. Compose begins a new message,
forward is used to send a previously received message to
someone else, reply is to respond to a message, and
modify allows one to change a message which has not
yet been sent. The mail service allows a user to
compose multiple messages at a time.

The text of a message is typed in using an available
composition window, or generating a window if none are
available. The exact form of the typing area of the
composition window depends on the nature of the
windowing system used on a particular client computer.
Typically, while typing the user can use short-cuts for
editing actions such as cut, paste, copy, delete, undo,
and so forth.

Text portions from another message can be
inserted by using the Insert Msg, or Quote Msg
buttons. If an entire message is to be included, then the
Forward button should be used. The message will not
actually be posted until the send function is selected.
While the message is being composed, it is periodically
saved by the mail system. Thus, a composition session
started using one client computer in an office can easily
be completed some time later using another computer.

Send: Sends a message. Any attachments are
included before sending the message. The user is
notified of invalid recipients by a status message,
and editing of the message can continue.
Otherwise, the window is switched to read-only mode.

Close: After a message has been sent, or the
discard button is clicked, this button replaces the send
button to allow one to close the composition
window.

Discard: This button is used to discard the message
being composed, and switches the window to
read-only. A user can then click the close or modify
buttons.

Modify: After a message has been successfully
sent, or if the discard button has been clicked, this
button appears in place of the discard button to
allow the user to compose another message
derived from the current message.

Wrap: This function is used to limit the number of
characters on any one line to eighty, as required by
some mailing systems.

Insert Msg: Replace the selected text with
displayed text from a selected message.

Quote Msg: Replace the selected text with
displayed text from a selected message so that each
line is preceded by the quote character.
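The Wrap and Quote Msg functions can be sketched with the Python standard library. The function names are hypothetical, and the "> " prefix is an assumption: the quoting character itself is not legible in the text above.

```python
import textwrap


def wrap(text, width=80):
    """Limit every line to at most 'width' characters.

    Existing line breaks, including blank lines, are kept.
    """
    wrapped = []
    for line in text.split("\n"):
        wrapped.extend(textwrap.wrap(line, width=width) or [""])
    return "\n".join(wrapped)


def quote(text, prefix="> "):
    # Assumption: "> " stands in for the unspecified quoting character.
    return "\n".join(prefix + line for line in text.split("\n"))
```

Quoting before wrapping can push a line past the limit, so a composition window would presumably apply Wrap after Quote Msg, or wrap to a narrower width first.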

Having described a preferred embodiment of the
invention, it will now become apparent to one skilled in
the art that other embodiments incorporating its
concepts may be used within the scope of the
invention.


Claims 

1. An electronic type distributed mail system,
comprising:

a plurality of client computers for connecting to
a network, each client computer being configured
for executing client mail application programs;

a mail service system connected to the network
for executing server mail programs on
server computers, the mail service system
including

an index server for storing mail messages in
message files, and for storing a full-text index
of the mail messages; and

means for accessing the mail messages by the
plurality of client computers by searching the
full-text index using queries.

2. The distributed mail system of claim 1, wherein the
client mail application programs are down-loaded
from the mail service system over the network.

3. The distributed mail system of claim 1, wherein the
query includes:

terms, selected of the terms to be connected
by operators; and

means for composing a query using the client
application mail programs; and

searching the full-text index to locate mail
messages which satisfy the terms and operators of
the query.

4. The distributed mail system of claim 1, wherein a
state of operation of each client computer
accessing mail messages via the network is maintained in
the mail service system.

5. The distributed mail system of claim 1, wherein the
mail service system further includes:

a front-end Web server connected to the
network; and

back-end interface programs connected to the
front-end Web server, the back-end interface
programs connected to an account manager, a
mail server 240, and the index server.

6. The distributed mail system of claim 5, wherein the
account manager maintains for each user of the
distributed mail system account information, the
account information including a mail-box address
for the user, a user password, a label state, named
queries, filter queries, query position information,
user preferences, and saved composition states,
further including:

means for adding labels to the mail messages;

means for removing labels from the mail
messages; and

means for storing the labels in the full-text
index.

7. The distributed mail system of claim 6, wherein the
label state of a particular user includes a set of
labels being used by the particular user, and the
label state is down-loaded from the account
information to a particular client computer operated by
the particular user while connected to the mail
service system.

8. The distributed mail system of claim 6, wherein a
particular user composes queries using the client
mail application programs.

9. The distributed mail system of claim 6 further
including:

means for naming the queries, and means for
down-loading the named queries from the
account information to a particular client
computer operated by the particular user while
connected to the mail service system.

10. The distributed mail system of claim 1, wherein the
client application mail programs execute from within
a browser program of the client computer.